In this study, the ESIF VASP Benchmarks 1 and 2 were used. Benchmark 1 is a system of 16 atoms, and Benchmark 2 is a system of 519 atoms. Benchmark 2 was used to explore differences in runtime between half-filled and full nodes on Swift and Eagle, as well as runtime improvements from running on Eagle's GPU nodes. Since Benchmark 1 represents a smaller system and requires less computational time, all Benchmark 1 calculations were run using 4x4x2 and 10x10x5 kpoints grids in order to measure kpoints scaling and parallelization on Swift and Eagle. Additionally, Benchmark 1 was run with 8 different KPAR and NPAR configurations in the INCAR file in order to explore the efficiency of various VASP parallelization schemes on Swift and Eagle. Both Benchmarks were used to compare runtimes between IntelMPI and OpenMPI and across various cpu-bind settings.
Average scaling rates between two parameter settings were calculated by first computing the average runtime at each core count for each setting, taking the ratio of the two settings' averages at each core count, and then averaging those ratios across all core counts. Since differences in runtime tend to be larger at higher core counts, this was done to give each core count equal weight in the overall average.
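As a sketch of this equal-weight averaging, using made-up runtimes for two hypothetical settings A and B:

```python
import numpy as np

# hypothetical average runtimes (seconds) for settings A and B at two core counts;
# the numbers are invented purely to illustrate the averaging scheme
avg_a = {4: 20.0, 8: 8.0}
avg_b = {4: 10.0, 8: 2.0}

# ratio at each core count, then an unweighted mean across core counts,
# so the large-core-count runs do not dominate the average
ratios = [avg_a[c] / avg_b[c] for c in sorted(avg_a)]
avg_scaling = np.average(ratios)
print(avg_scaling)  # 3.0: the mean of the per-core-count ratios 2.0 and 4.0
```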
import numpy as np
import pandas as pd
import datetime as dt
from matplotlib import pyplot as plt
import os
import plotly.express as px
import plotly.io as io
# load formatting extension, not necessary
%load_ext nb_black
# read aggregate data file
data = pd.read_csv("aggregate_data.csv")
# view aggregate data
data
| | Date | HPC System | Job ID | Partition | Benchmark Code | kpoints | math library | MPI | cpu-bind | Nodes | Cores | Nodelist | Energy | Electronic Steps | Runtime | KPAR | NPAR | node_fill |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Thu Jan 6 09:22:57 MST 2022 | Eagle | 8144861 | short | 1 | 10x10x5 | openmpi | openmpi | cores | 1 | 36 | r10i1n25 | -6.03E+01 | 13 | 0:09:54 | 1 | D | full |
| 1 | Thu Jan 6 09:12:43 MST 2022 | Eagle | 8144847 | short | 1 | 10x10x5 | mkl | intel_impi | cores | 1 | 36 | r5i0n1 | -6.03E+01 | 13 | 0:09:59 | 1 | D | full |
| 2 | Thu Jan 6 09:35:59 MST 2022 | Eagle | 8144917 | short | 1 | 10x10x5 | openmpi | openmpi | none | 1 | 36 | r10i0n16 | -6.03E+01 | 13 | 0:09:54 | 1 | D | full |
| 3 | Thu Jan 6 09:34:07 MST 2022 | Eagle | 8144903 | short | 1 | 10x10x5 | mkl | intel_impi | none | 1 | 36 | r4i2n4 | -6.03E+01 | 13 | 0:09:58 | 1 | D | full |
| 4 | Thu Jan 6 09:29:02 MST 2022 | Eagle | 8144889 | short | 1 | 10x10x5 | openmpi | openmpi | rank | 1 | 36 | r4i6n32 | -6.03E+01 | 13 | 0:09:20 | 1 | D | full |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1567 | Thu Jan 20 22:06:03 MST 2022 | Eagle | 8183166 | short | 1 | 10x10x5 | openmpi | openmpi | none | 1 | 9 | r5i6n13 | -6.03E+01 | 15 | 0:22:00 | 9 | sqrt | partial |
| 1568 | Fri Dec 17 01:42:24 MST 2021 | Eagle | 8044011 | short | 2 | 1x1x1 | mkl | intel_impi | cores | 1 | 9 | r6i2n5 | -1.27E+03 | 36 | 4:53:27 | 1 | D | partial |
| 1569 | Fri Dec 17 03:16:01 MST 2021 | Eagle | 8044013 | short | 2 | 1x1x1 | mkl | intel_impi | none | 1 | 9 | r6i2n23 | -1.27E+03 | 36 | 5:56:45 | 1 | D | partial |
| 1570 | Fri Dec 17 03:12:33 MST 2021 | Eagle | 8044014 | short | 2 | 1x1x1 | mkl | intel_impi | rank | 1 | 9 | r6i7n28 | -1.27E+03 | 36 | 5:22:33 | 1 | D | partial |
| 1571 | Fri Dec 3 22:47:06 UTC 2021 | Vermillion | 50004336 | std | small | 4x4x4 | openmpi | openmpi | cores | 1 | 8 | vs-std-0003 | -3.93E+01 | 18 | 0:00:04 | 1 | D | partial |
1572 rows × 18 columns
# remove rows whose calculations produced errors, as marked in the Energy column
# ("error" is recorded when no energy value was produced or an error was found in the raw data)
data_valid = data[data.Energy != "error"].copy()
# get time expressed in seconds
def get_seconds(row):
time_str = row["Runtime"]
baseline = "00:00:00"
read_time = dt.datetime.strptime(time_str, "%H:%M:%S")
read_baseline = dt.datetime.strptime(baseline, "%H:%M:%S")
total_time = read_time - read_baseline
seconds = total_time.total_seconds()
return seconds
data_valid["Runtime(s)"] = data_valid.apply(get_seconds, axis=1)
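The same conversion can be done more directly with `pd.to_timedelta`, which parses "H:MM:SS" strings without the manual baseline subtraction. A minimal sketch on a hypothetical frame with the same runtime format:

```python
import pandas as pd

# hypothetical frame using the same "H:MM:SS" runtime strings as the benchmark data
df = pd.DataFrame({"Runtime": ["0:09:54", "4:53:27"]})
# pd.to_timedelta parses each string into a Timedelta; total_seconds() gives floats
df["Runtime(s)"] = pd.to_timedelta(df["Runtime"]).dt.total_seconds()
print(df["Runtime(s)"].tolist())  # [594.0, 17607.0]
```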
# create a new column with run time/num. electronic steps
def get_scaled_time(row):
time = row["Runtime(s)"]
elec_steps = row["Electronic Steps"]
time_per_step = float(time) / float(elec_steps)
return time_per_step
# note: data_scaled_elec is a reference to data_valid, not a copy, so the
# label corrections applied to data_valid later also apply to data_scaled_elec
data_scaled_elec = data_valid
data_scaled_elec["time_per_step"] = data_scaled_elec.apply(get_scaled_time, axis=1)
# create a new column that calculates a "rate" from the scaled time values calculated in the previous cell
def get_scaled_rate(row):
time_per_step = row["time_per_step"]
rate = 1.0 / float(time_per_step)
return rate
data_scaled_elec["scaled_rate"] = data_scaled_elec.apply(get_scaled_rate, axis=1)
# correct spelling inconsistencies
# only the MPI and HPC System inconsistencies are corrected here, since these are
# the only ones that have caused issues so far; other columns have not been checked
def intel_impi_label(row):
    if row["MPI"] in ("intel-impi", "intel_impi"):
        return "intel_impi"
    elif row["MPI"] in ("intel_impi_clara", "intel_impi-clara"):
        return "intel_impi_clara"
    else:
        return row["MPI"]
def HPC_sys_label(row):
    if row["HPC System"] in ("Swift", "swift"):
        return "Swift"
    elif row["HPC System"] in ("Eagle", "eagle"):
        return "Eagle"
    elif row["HPC System"] in ("Vermillion", "vermillion"):
        return "Vermillion"
    else:
        return row["HPC System"]
data_valid["MPI"] = data_valid.apply(intel_impi_label, axis=1)
data_valid["HPC System"] = data_valid.apply(HPC_sys_label, axis=1)
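The same cleanup can be expressed without row-wise `apply`: `Series.replace` maps known misspellings onto canonical labels and passes everything else through, and `str.capitalize` normalizes the system names. A sketch on a toy frame with the variants seen in the data:

```python
import pandas as pd

# toy frame containing the misspelled variants observed in the aggregate data
df = pd.DataFrame({
    "MPI": ["intel-impi", "openmpi", "intel_impi-clara"],
    "HPC System": ["eagle", "Swift", "vermillion"],
})
# replace() maps the known variants to canonical labels and leaves the rest alone
df["MPI"] = df["MPI"].replace(
    {"intel-impi": "intel_impi", "intel_impi-clara": "intel_impi_clara"}
)
# capitalize() normalizes lowercase system names ("eagle" -> "Eagle")
df["HPC System"] = df["HPC System"].str.capitalize()
print(df["MPI"].tolist(), df["HPC System"].tolist())
```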
def get_processor(row):
partition = row["Partition"]
if "gpu" in partition:
return "GPU"
else:
return "CPU"
data_scaled_elec["Processor"] = data_scaled_elec.apply(get_processor, axis=1)
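The processor column can also be derived in one vectorized step with `np.where`, shown here on hypothetical partition names:

```python
import numpy as np
import pandas as pd

# toy partitions; any partition name containing "gpu" counts as a GPU run
df = pd.DataFrame({"Partition": ["short", "gpu", "gpu-dev", "std"]})
df["Processor"] = np.where(df["Partition"].str.contains("gpu"), "GPU", "CPU")
print(df["Processor"].tolist())  # ['CPU', 'GPU', 'GPU', 'CPU']
```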
# view data scaled by number of electronic steps
data_scaled_elec
| | Date | HPC System | Job ID | Partition | Benchmark Code | kpoints | math library | MPI | cpu-bind | Nodes | ... | Energy | Electronic Steps | Runtime | KPAR | NPAR | node_fill | Runtime(s) | time_per_step | scaled_rate | Processor |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Thu Jan 6 09:22:57 MST 2022 | Eagle | 8144861 | short | 1 | 10x10x5 | openmpi | openmpi | cores | 1 | ... | -6.03E+01 | 13 | 0:09:54 | 1 | D | full | 594.0 | 45.692308 | 0.021886 | CPU |
| 1 | Thu Jan 6 09:12:43 MST 2022 | Eagle | 8144847 | short | 1 | 10x10x5 | mkl | intel_impi | cores | 1 | ... | -6.03E+01 | 13 | 0:09:59 | 1 | D | full | 599.0 | 46.076923 | 0.021703 | CPU |
| 2 | Thu Jan 6 09:35:59 MST 2022 | Eagle | 8144917 | short | 1 | 10x10x5 | openmpi | openmpi | none | 1 | ... | -6.03E+01 | 13 | 0:09:54 | 1 | D | full | 594.0 | 45.692308 | 0.021886 | CPU |
| 3 | Thu Jan 6 09:34:07 MST 2022 | Eagle | 8144903 | short | 1 | 10x10x5 | mkl | intel_impi | none | 1 | ... | -6.03E+01 | 13 | 0:09:58 | 1 | D | full | 598.0 | 46.000000 | 0.021739 | CPU |
| 4 | Thu Jan 6 09:29:02 MST 2022 | Eagle | 8144889 | short | 1 | 10x10x5 | openmpi | openmpi | rank | 1 | ... | -6.03E+01 | 13 | 0:09:20 | 1 | D | full | 560.0 | 43.076923 | 0.023214 | CPU |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1567 | Thu Jan 20 22:06:03 MST 2022 | Eagle | 8183166 | short | 1 | 10x10x5 | openmpi | openmpi | none | 1 | ... | -6.03E+01 | 15 | 0:22:00 | 9 | sqrt | partial | 1320.0 | 88.000000 | 0.011364 | CPU |
| 1568 | Fri Dec 17 01:42:24 MST 2021 | Eagle | 8044011 | short | 2 | 1x1x1 | mkl | intel_impi | cores | 1 | ... | -1.27E+03 | 36 | 4:53:27 | 1 | D | partial | 17607.0 | 489.083333 | 0.002045 | CPU |
| 1569 | Fri Dec 17 03:16:01 MST 2021 | Eagle | 8044013 | short | 2 | 1x1x1 | mkl | intel_impi | none | 1 | ... | -1.27E+03 | 36 | 5:56:45 | 1 | D | partial | 21405.0 | 594.583333 | 0.001682 | CPU |
| 1570 | Fri Dec 17 03:12:33 MST 2021 | Eagle | 8044014 | short | 2 | 1x1x1 | mkl | intel_impi | rank | 1 | ... | -1.27E+03 | 36 | 5:22:33 | 1 | D | partial | 19353.0 | 537.583333 | 0.001860 | CPU |
| 1571 | Fri Dec 3 22:47:06 UTC 2021 | Vermillion | 50004336 | std | small | 4x4x4 | openmpi | openmpi | cores | 1 | ... | -3.93E+01 | 18 | 0:00:04 | 1 | D | partial | 4.0 | 0.222222 | 4.500000 | CPU |
1528 rows × 22 columns
# define a function that computes the average runtime ratio between two values of a given parameter
def get_scaling(df, column, val1, val2):
    ratios = []
    df_val1 = df[df[column] == val1]
    df_val2 = df[df[column] == val2]
    for cores in np.unique(df_val1["Cores"]):
        if cores in np.unique(df_val2["Cores"]):
            # average time per electronic step for each setting at this core count
            df_val1_time = df_val1[df_val1["Cores"] == cores]["time_per_step"]
            df_val2_time = df_val2[df_val2["Cores"] == cores]["time_per_step"]
            df_val1_avg = np.average(df_val1_time.to_numpy())
            df_val2_avg = np.average(df_val2_time.to_numpy())
            ratios.append(float(df_val1_avg) / float(df_val2_avg))
    # unweighted mean so that each core count contributes equally
    return np.average(ratios)
data_eagle = data_scaled_elec[data_scaled_elec["HPC System"] == "Eagle"]
Cores per Node:
Running on half-filled nodes yields better runtime per core. Using Benchmark 2 on Eagle, calculations on half-filled nodes ran in an average of 81% of the time needed on full nodes with the same total number of cores.
Using GPUs:
Running on two GPUs per node significantly improves runtime performance using Benchmark 2. See the referenced graph for the extent of the improvement.
MPI:
Based on average runtimes over all node counts, there is little difference between running with Intel MPI or OpenMPI. However, for both Benchmark 1 and Benchmark 2, the graphs show that using OpenMPI may improve runtimes for multi-node jobs on full nodes. OpenMPI does not improve runtimes for multi-node jobs on half-full nodes.
cpu-bind:
Best cpu-bind performance by calculation type:
- Full nodes: Based on average runtimes over all node counts, --cpu-bind has little effect on results. However, the graphs show that setting --cpu-bind=rank may improve runtimes for jobs at higher node counts (4+ nodes) using both Benchmark 1 and Benchmark 2.
- Half-filled nodes: Using Benchmark 1, --cpu-bind=cores runs in an average of 92% of the time needed by calculations with no cpu-bind set. The runtime improvement is consistent across all node counts.
- GPU nodes: Setting --cpu-bind to either rank or cores yields worse runtimes than leaving --cpu-bind unset, using Benchmark 1.
KPOINTS Scaling:
Benchmark 1 was run using both a 4x4x2 kpoints grid (32 kpoints) and a 10x10x5 kpoints grid (500 kpoints). All Benchmark 1 calculations were run using full nodes. If runtime scaled proportionally with the number of kpoints, the 4x4x2 kpoints grid calculations would run in 6.4% (32/500) of the time needed to run the 10x10x5 kpoints grid calculations. However, we found that the 4x4x2 kpoints grid calculations ran, on average, in 26.19% of the time needed to run the 10x10x5 kpoints grid calculations. In fact, using the best-performing values of KPAR and NPAR, the 4x4x2 kpoints grid calculations ran in 41% of the time needed to run the 10x10x5 kpoints grid calculations. Overall, we found that using a smaller kpoints grid does not yield the expected decrease in runtime.
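The gap between ideal and observed scaling can be quantified directly from the figures quoted above:

```python
# ideal kpoints scaling vs. the observed average ratio reported in the text
expected = 32 / 500       # 6.4%: runtime proportional to the number of kpoints
observed = 0.2619         # measured average 4x4x2/10x10x5 runtime ratio
slowdown = observed / expected
print(round(slowdown, 1))  # ~4.1x the runtime that ideal scaling would predict
```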
For each combination of KPAR and NPAR, the table below gives the average time needed to run each 4x4x2 calculation expressed as a percentage of the time needed to run the corresponding 10x10x5 calculations. The default KPAR/NPAR configuration (KPAR=1, NPAR=# of cores) yields the best KPOINTS scaling, but the slowest overall runtimes.
KPAR and NPAR:
All KPAR/NPAR results are from Benchmark 1 calculations on full nodes. For each combination of KPAR and NPAR used, the "Comparison to Default" columns in the table below give the average time needed to run with the given KPAR/NPAR configuration expressed as a percentage of the time needed to run the corresponding calculations with the default KPAR/NPAR settings (KPAR=1, NPAR=# of cores). Separate averages were computed for calculations with 4x4x2 kpoints grids and those with 10x10x5 kpoints grids. Based on average runtimes across all core counts, KPAR=4, NPAR=sqrt(# of cores) is the best-performing configuration of KPAR and NPAR, followed by the other two configurations with KPAR=4. However, the extent to which the runtime improves as core count increases is lost in taking the average - KPAR=9, NPAR=4 is the configuration that reaches the fastest runtime at high core counts.
Configurations that don't perform as well at higher node/core counts:
- KPAR = 1, NPAR = 4
- KPAR = 1, NPAR = # of cores
- KPAR = 1, NPAR = sqrt(# of cores)
Configurations that do perform well at higher node/core counts:
- KPAR = 4, NPAR = # of cores
- KPAR = 9, NPAR = 4
- KPAR = 4, NPAR = sqrt(# of cores)
- KPAR = 4, NPAR = 4
- KPAR = 9, NPAR = sqrt(# of cores)
| Configuration | Average 4x4x2 Runtime as a Percentage of 10x10x5 Runtime | Average Runtime as a Percentage of Default KPAR/NPAR Configuration Runtime (4x4x2) | Average Runtime as a Percentage of Default KPAR/NPAR Configuration Runtime (10x10x5) |
|---|---|---|---|
| KPAR=1,NPAR=4 | 21.59% | 64.31% | 71.27% |
| KPAR=1,NPAR=# of cores | 20.15% | Default | Default |
| KPAR=1,NPAR=sqrt(# of cores) | 26.15% | 78.48% | 69.92% |
| KPAR=4,NPAR=# of cores | 26.28% | 36.61% | 40.97% |
| KPAR=9,NPAR=4 | 41.41% | 70.91% | 58.44% |
| KPAR=4,NPAR=sqrt(# of cores) | 22.62% | 35.38% | 56.50% |
| KPAR=4,NPAR=4 | 23.83% | 36.29% | 42.71% |
| KPAR=9,NPAR=sqrt(# of cores) | 27.51% | 71.88% | 59.42% |
| Average | 26.19% | | |
data_eagle_2 = data_eagle[data_eagle["Benchmark Code"] == "2"]
data_eagle_2_no_gpu = data_eagle_2[data_eagle_2["Partition"] != "gpu"]
fig = px.scatter(
data_eagle_2_no_gpu,
x="Cores",
y="scaled_rate",
color="node_fill",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle",
)
fig.show()
proportion = get_scaling(data_eagle_2_no_gpu, "node_fill", "half", "full")
print(
"Using the same number of cores, calculations on half-filled nodes run in an average of",
proportion,
"the amount of time as calculations on full nodes.",
)
Using the same number of cores, calculations on half-filled nodes run in an average of 0.8121484814769462 the amount of time as calculations on full nodes.
fig = px.scatter(
data_eagle_2,
x="Cores",
y="scaled_rate",
color="node_fill",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle",
)
fig.show()
data_eagle_2_full = data_eagle_2[data_eagle_2["node_fill"] == "full"]
data_eagle_2_half = data_eagle_2[data_eagle_2["node_fill"] == "half"]
data_eagle_2_gpu = data_eagle_2[data_eagle_2["Partition"] == "gpu"]
fig = px.scatter(
data_eagle_2_full,
x="Cores",
y="scaled_rate",
color="MPI",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (Full Nodes)",
)
fig.show()
proportion = get_scaling(data_eagle_2_full, "MPI", "openmpi", "intel_impi")
print(
"On full nodes, calculations using openmpi run in an average of",
proportion,
"the amount of time as calculations using intel mpi.",
)
On full nodes, calculations using openmpi run in an average of 1.0056527077811492 the amount of time as calculations using intel mpi.
fig = px.scatter(
data_eagle_2_half,
x="Cores",
y="scaled_rate",
color="MPI",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (Half-filled Nodes)",
)
fig.show()
proportion = get_scaling(data_eagle_2_half, "MPI", "openmpi", "intel_impi")
print(
    "On half-filled nodes, calculations using openmpi run in an average of",
proportion,
"the amount of time as calculations using intel mpi.",
)
On half-filled nodes, calculations using openmpi run in an average of 1.0046683320952752 the amount of time as calculations using intel mpi.
fig = px.scatter(
data_eagle_2_gpu,
x="Cores",
y="scaled_rate",
color="MPI",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (GPUs)",
)
fig.show()
proportion = get_scaling(data_eagle_2_gpu, "MPI", "openmpi", "intel_impi")
print(
    "On GPU nodes, calculations using openmpi run in an average of",
proportion,
"the amount of time as calculations using intel mpi.",
)
On GPU nodes, calculations using openmpi run in an average of 0.9931735233077907 the amount of time as calculations using intel mpi.
fig = px.scatter(
data_eagle_2_full,
x="Cores",
y="scaled_rate",
color="cpu-bind",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (Full Nodes)",
)
fig.show()
proportion_rank = get_scaling(data_eagle_2_full, "cpu-bind", "rank", "none")
print(
"On average, calculations on full nodes with --cpu-bind=rank run in",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
proportion_cores = get_scaling(data_eagle_2_full, "cpu-bind", "cores", "none")
print(
"\nOn average, calculations on full nodes with --cpu-bind=cores run in",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
On average, calculations on full nodes with --cpu-bind=rank run in 0.9604925132123258 the amount of time as calculations without --cpu-bind. On average, calculations on full nodes with --cpu-bind=cores run in 1.1009252330938635 the amount of time as calculations without --cpu-bind.
fig = px.scatter(
data_eagle_2_half,
x="Cores",
y="scaled_rate",
color="cpu-bind",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (Half-filled Nodes)",
)
fig.show()
proportion_rank = get_scaling(data_eagle_2_half, "cpu-bind", "rank", "none")
print(
"On average, calculations on half-filled nodes with --cpu-bind=rank run in",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
proportion_cores = get_scaling(data_eagle_2_half, "cpu-bind", "cores", "none")
print(
"\nOn average, calculations on half-filled nodes with --cpu-bind=cores run in",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
On average, calculations on half-filled nodes with --cpu-bind=rank run in 1.1625471216271008 the amount of time as calculations without --cpu-bind. On average, calculations on half-filled nodes with --cpu-bind=cores run in 0.9182058959786629 the amount of time as calculations without --cpu-bind.
fig = px.scatter(
data_eagle_2_gpu,
x="Cores",
y="scaled_rate",
color="cpu-bind",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Eagle (GPUs)",
)
fig.show()
proportion_rank = get_scaling(data_eagle_2_gpu, "cpu-bind", "rank", "none")
print(
"On average, calculations on gpu nodes with --cpu-bind=rank run in",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
proportion_cores = get_scaling(data_eagle_2_gpu, "cpu-bind", "cores", "none")
print(
"\nOn average, calculations on gpu nodes with --cpu-bind=cores run in",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
On average, calculations on gpu nodes with --cpu-bind=rank run in 1.0505241809518926 the amount of time as calculations without --cpu-bind. On average, calculations on gpu nodes with --cpu-bind=cores run in 1.0065700737595995 the amount of time as calculations without --cpu-bind.
Because it is smaller than Benchmark 2, Benchmark 1 was used to explore kpoints scaling as well as changes in performance due to the KPAR and NPAR tags.
data_eagle_1 = data_eagle[data_eagle["Benchmark Code"] == "1"]
To get a good idea of scaling between calculations with 4x4x2 kpoints grids and 10x10x5 kpoints grids, look at the graphs in the following section. Each graph is followed by an average kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. To view a table of results, see Recommendations for Running VASP on Eagle.
data_eagle_1_K1 = data_eagle_1[data_eagle_1["KPAR"] == 1]
data_eagle_1_K4 = data_eagle_1[data_eagle_1["KPAR"] == 4]
data_eagle_1_K9 = data_eagle_1[data_eagle_1["KPAR"] == 9]
data_eagle_1_K1_N4 = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "4"]
data_eagle_1_K1_ND = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "D"]
data_eagle_1_K1_Nsqrt = data_eagle_1_K1[data_eagle_1_K1["NPAR"] == "sqrt"]
data_eagle_1_K4_ND = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "D"]
data_eagle_1_K9_N4 = data_eagle_1_K9[data_eagle_1_K9["NPAR"] == "4"]
data_eagle_1_K4_Nsqrt = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "sqrt"]
data_eagle_1_K4_N4 = data_eagle_1_K4[data_eagle_1_K4["NPAR"] == "4"]
data_eagle_1_K9_Nsqrt = data_eagle_1_K9[data_eagle_1_K9["NPAR"] == "sqrt"]
# adapt get_scaling to compare two KPAR/NPAR configurations whose core counts may not match exactly
def find_nearest(array, value):
array = [int(i) for i in array]
array = np.asarray(array)
idx = (np.abs(array - value)).argmin()
return array[idx]
def get_scaling_KN(df1, df2, kpoints):
    ratios = []
    df1 = df1[df1["kpoints"] == kpoints]
    df2 = df2[df2["kpoints"] == kpoints]
    for cores_1 in np.unique(df1["Cores"]):
        cores_2 = find_nearest(np.unique(df2["Cores"]), cores_1)
        # only compare runs whose core counts differ by fewer than 3 cores
        if np.abs(cores_2 - cores_1) < 3:
            df1_time = df1[df1["Cores"] == cores_1]["time_per_step"]
            df2_time = df2[df2["Cores"] == cores_2]["time_per_step"]
            df1_avg = np.average(df1_time.to_numpy())
            df2_avg = np.average(df2_time.to_numpy())
            ratios.append(float(df1_avg) / float(df2_avg))
    return np.average(ratios)
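`find_nearest` pairs each core count in one configuration with the closest core count in the other, even though core counts are stored as strings. A quick standalone check (the helper is copied here so the cell runs on its own):

```python
import numpy as np

def find_nearest(array, value):
    # standalone copy of the helper above; converts string core counts to ints
    array = np.asarray([int(i) for i in array])
    return array[np.abs(array - value).argmin()]

# match a hypothetical 35-core run to the closest core count in the other set
print(find_nearest(["9", "18", "36"], 35))  # 36
```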
fig = px.scatter(
data_eagle_1_K1_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=1, NPAR=4",
)
fig.show()
proportion = get_scaling(data_eagle_1_K1_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.2158856898471929 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K1_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_eagle_1_K1_N4, data_eagle_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.6430988260472635 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.7126574907286309 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K1_ND,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=1, NPAR=# of cores",
)
fig.show()
proportion = get_scaling(data_eagle_1_K1_ND, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.2014703092116728 the amount of time as the 10x10x5 kpoints grid calculations
fig = px.scatter(
data_eagle_1_K1_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=1, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_eagle_1_K1_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.2615582479340918 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K1_Nsqrt, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_eagle_1_K1_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.7847840621014068 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.6991603785432458 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K4_ND,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Nodes",
"Partition",
"cpu-bind",
"MPI",
"Benchmark Code",
],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=4, NPAR=# of cores",
)
fig.show()
proportion = get_scaling(data_eagle_1_K4_ND, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.2627675712217573 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K4_ND, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_eagle_1_K4_ND, data_eagle_1_K1_ND, "10x10x5")
print(
    "On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
    "\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in 0.3661494297317568 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in 0.40966266250690236 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K9_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
    labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=9, NPAR=4",
)
fig.show()
proportion = get_scaling(data_eagle_1_K9_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.41407912054742224 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K9_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_eagle_1_K9_N4, data_eagle_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 ran in 0.7090692212812519 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 ran in 0.5844413364433881 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K4_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=4, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_eagle_1_K4_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.22617692814102044 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K4_Nsqrt, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_eagle_1_K4_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in 0.35381217524785147 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in 0.5649568803141125 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K4_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=4, NPAR=4",
)
fig.show()
proportion = get_scaling(data_eagle_1_K4_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.23828639659359566 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K4_N4, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_eagle_1_K4_N4, data_eagle_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 ran in 0.3628997338251977 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 ran in 0.42714837712868303 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_eagle_1_K9_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle: KPAR=9, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_eagle_1_K9_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.2751181088121913 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_eagle_1_K9_Nsqrt, data_eagle_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_eagle_1_K9_Nsqrt, data_eagle_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in 0.718825197761971 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in 0.5942041449053749 the amount of time needed for calculations with the default KPAR/NPAR settings.
Now let's take the best-performing combination of KPAR and NPAR to continue the analysis: KPAR=9, NPAR=4
fig = px.scatter(
data_eagle_1_K9_N4,
x="Cores",
y="scaled_rate",
color="MPI",
symbol="kpoints",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle",
)
fig.show()
proportion = get_scaling(data_eagle_1_K9_N4, "MPI", "openmpi", "intel_impi")
print(
"Calculations using openmpi run in an average of",
proportion,
"the amount of time as calculations using intel mpi.",
)
Calculations using openmpi run in an average of 1.0254375511756568 the amount of time as calculations using intel mpi.
fig = px.scatter(
data_eagle_1_K9_N4,
x="Cores",
y="scaled_rate",
color="cpu-bind",
symbol="kpoints",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Eagle",
)
fig.show()
proportion_rank = get_scaling(data_eagle_1_K9_N4, "cpu-bind", "rank", "none")
proportion_cores = get_scaling(data_eagle_1_K9_N4, "cpu-bind", "cores", "none")
print(
"Calculations using --cpu-bind=rank run in an average of",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
print(
"\nCalculations using --cpu-bind=cores run in an average of",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
Calculations using --cpu-bind=rank run in an average of 0.9701700624438857 the amount of time as calculations without --cpu-bind. Calculations using --cpu-bind=cores run in an average of 1.1259091578496088 the amount of time as calculations without --cpu-bind.
data_swift = data_scaled_elec[data_scaled_elec["HPC System"] == "Swift"]
In this study, the ESIF VASP Benchmarks 1 and 2 were used. Benchmark 1 is a system of 16 atoms, and Benchmark 2 is a system of 519 atoms. Benchmark 2 was used to explore differences in runtimes between running on half-filled and full nodes. Since Benchmark 1 represents a smaller system and requires less computational time to run, all Benchmark 1 calculations were run using 4x4x2 and 10x10x5 kpoints grids in order to measure kpoints scaling and parallelization on Swift. Additionally, Benchmark 1 was run with 8 different KPAR and NPAR configurations in the INCAR file in order to explore the efficiency of various VASP parallelization schemes on Swift. Both Benchmarks were used to compare runtimes between IntelMPI and OpenMPI and across various cpu-bind settings.
Average scaling rates between two parameter settings were calculated by first calculating the average runtime at every core count for each setting, calculating the percent difference between the two settings at each core count, and then averaging the set of percentages across all core counts. Since differences in runtime tend to be larger at higher core counts, this was done to give each core count equal weight in the overall average.
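The averaging procedure above is what the `get_scaling` calls throughout this notebook compute. As a rough sketch of that helper (the real one is defined earlier in the notebook; the `Runtime_s` column name and exact signature here are assumptions for illustration):

```python
import pandas as pd


def get_scaling(df, column, setting_a, setting_b):
    # Average runtime at every core count for each setting...
    mean_a = df[df[column] == setting_a].groupby("Cores")["Runtime_s"].mean()
    mean_b = df[df[column] == setting_b].groupby("Cores")["Runtime_s"].mean()
    # ...then the per-core-count ratios, averaged so that every core count
    # gets equal weight (only core counts present for both settings count).
    return (mean_a / mean_b).dropna().mean()


# Toy example: setting "a" runs in 0.5x the time at 36 cores, 0.75x at 72 cores
toy = pd.DataFrame(
    {
        "Cores": [36, 36, 72, 72],
        "MPI": ["a", "b", "a", "b"],
        "Runtime_s": [10.0, 20.0, 30.0, 40.0],
    }
)
print(get_scaling(toy, "MPI", "a", "b"))  # 0.625
```

Averaging ratios rather than raw runtimes keeps the high-core-count jobs, whose absolute runtime differences are largest, from dominating the result.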
Cores per Node:
Cores per Node performance is reported separately for Intel MPI and OpenMPI, since the two perform drastically differently on Swift:
- Using Intel MPI: Running on half-full nodes yields better runtime per core used. Using Benchmark 2 on Swift with Intel MPI, running on half-full nodes used an average of 83.73% of the runtime used to run on full nodes with the same total number of cores.
- Using OpenMPI: Running on half-full nodes yields better runtime per core used. Using Benchmark 2 on Swift with OpenMPI, running on half-full nodes used an average of 74.60% of the runtime used to run on full nodes with the same total number of cores.
MPI:
Best MPI performance by calculation type:
- Full nodes: Calculations on full nodes run with Intel MPI have significantly faster runtimes than calculations run with OpenMPI using both Benchmark 1 and Benchmark 2 on Swift. For Benchmark 2, Intel MPI calculations run in an average of 52.41% of the time needed for OpenMPI calculations. For Benchmark 1, Intel MPI calculations run in an average of 84.49% of the time needed for OpenMPI calculations using a 4x4x2 kpoints grid, and in an average of 85.82% of the time needed for OpenMPI calculations using a 10x10x5 kpoints grid.
- Half-filled nodes: Calculations on half-filled nodes run with Intel MPI have significantly faster runtimes than calculations run with OpenMPI using both Benchmark 1 and Benchmark 2 on Swift. For Benchmark 2, Intel MPI calculations run in an average of 71.77% of the time needed for OpenMPI calculations.
cpu-bind:
Best cpu-bind performance by calculation type:
- Full nodes: On average, calculations with --cpu-bind=rank run in 88.53% of the time needed for calculations with no --cpu-bind using Benchmark 2, but --cpu-bind did not affect runtimes using Benchmark 1.
- Half-filled nodes: Setting --cpu-bind to either rank or cores yields much slower runtimes than setting no --cpu-bind using Benchmark 2.
KPOINTS Scaling:
Benchmark 1 was run using both a 4x4x2 kpoints grid (32 kpoints) and a 10x10x5 kpoints grid (500 kpoints). All Benchmark 1 calculations were run using full nodes. We should expect the runtime to scale proportionally to the change in kpoints, so we would expect the 4x4x2 kpoints grid calculations to run in 6.4% (32/500) of the amount of time needed to run the 10x10x5 kpoints grid calculations. However, we found that the 4x4x2 kpoints grid calculations ran, on average, in 23.03% of the amount of time needed to run the 10x10x5 kpoints grid calculations. In fact, using the best-performing values of KPAR and NPAR, the 4x4x2 kpoints grid calculations ran in 34.10% of the amount of time needed to run the 10x10x5 kpoints grid calculations. Overall, we found that using a smaller kpoints grid does not yield the expected decrease in runtime.
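The expected figure above is simply the ratio of kpoint counts; setting it next to the observed averages reported in this notebook makes the gap explicit:

```python
# Expected kpoints scaling: runtime proportional to the number of kpoints
expected = 32 / 500  # 4x4x2 grid (32 kpoints) vs. 10x10x5 grid (500 kpoints)

# Observed averages from this notebook (Benchmark 1, full nodes)
observed_avg = 0.2303  # averaged across all KPAR/NPAR configurations
observed_best = 0.3410  # best-performing configuration (KPAR=9, NPAR=4)

print(f"expected: {expected:.1%}")  # 6.4%
print(f"observed: {observed_avg:.1%} (average), {observed_best:.1%} (KPAR=9, NPAR=4)")
```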
For each combination of KPAR and NPAR, the table below gives the average time needed to run each 4x4x2 calculation expressed as a percentage of the time needed to run the corresponding 10x10x5 calculations. The default KPAR/NPAR configuration (KPAR=1, NPAR=# of cores) yields some of the best kpoints scaling, but the slowest overall runtimes.
KPAR and NPAR:
All KPAR/NPAR results are from Benchmark 1 calculations on full nodes. For each combination of KPAR and NPAR used, the table below gives the average time needed to run with the given KPAR/NPAR configuration, expressed as a percentage of the time needed to run the corresponding calculations with the default KPAR/NPAR settings (KPAR=1, NPAR=# of cores). Separate averages were computed for calculations with 4x4x2 kpoints grids and those with 10x10x5 kpoints grids.
Based on average runtimes across all core counts, KPAR=9, NPAR=4 and KPAR=4, NPAR=4 have the best runtimes. However, KPAR=9, NPAR=4 is the only configuration with fast runtimes at high core counts; all other configurations see runtimes increase as core counts increase, which is unexpected. KPAR=9, NPAR=4 performs worse than all three of the KPAR=4 configurations at low core counts (1 or 2 nodes), where the KPAR=4, NPAR=4 configuration has the fastest runtimes.
Configurations that perform best at lower node/core counts (1 or 2 nodes):
- KPAR = 4, NPAR = # of cores
- KPAR = 4, NPAR = sqrt(# of cores)
- KPAR = 4, NPAR = 4
Configurations that perform best on higher node/core counts (3+ nodes):
- KPAR = 9, NPAR = 4
| Configuration | Average 4x4x2 Runtime as a Percentage of 10x10x5 Runtime | Average Runtime as a Percentage of Default KPAR/NPAR Configuration Runtime (4x4x2) | Average Runtime as a Percentage of Default KPAR/NPAR Configuration Runtime (10x10x5) |
|---|---|---|---|
| KPAR=1,NPAR=4 | 19.75% | 50.90% | 60.28% |
| KPAR=1,NPAR=# of cores | 18.76% | Default | Default |
| KPAR=1,NPAR=sqrt(# of cores) | 18.56% | 89.92% | 89.36% |
| KPAR=4,NPAR=# of cores | 20.46% | 31.55% | 35.68% |
| KPAR=9,NPAR=4 | 34.10% | 37.86% | 24.23% |
| KPAR=4,NPAR=sqrt(# of cores) | 19.94% | 37.25% | 43.02% |
| KPAR=4,NPAR=4 | 20.70% | 29.76% | 33.77% |
| KPAR=9,NPAR=sqrt(# of cores) | 31.93% | 34.96% | 50.81% |
| Average | 23.03% | | |
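The rules of thumb above could be encoded in a small helper. This is a hypothetical convenience function illustrating the recommendation, not part of the benchmark scripts:

```python
def recommend_kpar_npar(nodes):
    """Suggest INCAR KPAR/NPAR for Benchmark 1 on Swift from the node count
    (hypothetical helper encoding the observations above)."""
    if nodes <= 2:
        # KPAR=4 configurations are fastest at 1-2 nodes; KPAR=4, NPAR=4 is best
        return {"KPAR": 4, "NPAR": 4}
    # KPAR=9, NPAR=4 is the only configuration that stays fast at 3+ nodes
    return {"KPAR": 9, "NPAR": 4}


print(recommend_kpar_npar(1), recommend_kpar_npar(4))
```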
data_swift_2 = data_swift[data_swift["Benchmark Code"] == "2"]
fig = px.scatter(
data_swift_2,
x="Cores",
y="scaled_rate",
color="node_fill",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Swift",
)
fig.show()
Calculating runtime statistics to compare half-full and full nodes
Since there is such a big difference between Intel MPI and OpenMPI performance, we'll split the data by MPI before calculating average rate increases for half-filled nodes.
data_swift_2_impi = data_swift_2[data_swift_2["MPI"] == "intel_impi"]
data_swift_2_openmpi = data_swift_2[data_swift_2["MPI"] == "openmpi"]
proportion_impi = get_scaling(data_swift_2_impi, "node_fill", "half", "full")
proportion_openmpi = get_scaling(data_swift_2_openmpi, "node_fill", "half", "full")
print(
"Intel MPI: On average, calculations running on half-filled nodes run in",
proportion_impi,
"the amount of time as calculations running on full nodes with the same number of cores",
)
print(
"\nOpenMPI: On average, calculations running on half-filled nodes run in",
proportion_openmpi,
"the amount of time as calculations running on full nodes with the same number of cores",
)
Intel MPI: On average, calculations running on half-filled nodes run in 0.8372589810827744 the amount of time as calculations running on full nodes with the same number of cores OpenMPI: On average, calculations running on half-filled nodes run in 0.7460202698705677 the amount of time as calculations running on full nodes with the same number of cores
data_swift_2_full = data_swift_2[data_swift_2["node_fill"] == "full"]
data_swift_2_half = data_swift_2[data_swift_2["node_fill"] == "half"]
fig = px.scatter(
data_swift_2_full,
x="Cores",
y="scaled_rate",
color="MPI",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Swift (Full Nodes)",
)
fig.show()
proportion = get_scaling(data_swift_2_full, "MPI", "intel_impi", "openmpi")
print(
"On average, calculations running on full nodes using Intel MPI run in",
proportion,
"the amount of time as calculations using OpenMPI",
)
On average, calculations running on full nodes using Intel MPI run in 0.5240630467696896 the amount of time as calculations using OpenMPI
fig = px.scatter(
data_swift_2_half,
x="Cores",
y="scaled_rate",
color="MPI",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Swift (Half Nodes)",
)
fig.show()
proportion = get_scaling(data_swift_2_half, "MPI", "intel_impi", "openmpi")
print(
"On average, calculations running on half-filled nodes using Intel MPI run in",
proportion,
"the amount of time as calculations using OpenMPI",
)
On average, calculations running on half-filled nodes using Intel MPI run in 0.7177494200819271 the amount of time as calculations using OpenMPI
fig = px.scatter(
data_swift_2_full,
x="Cores",
y="scaled_rate",
color="cpu-bind",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Swift (Full Nodes)",
)
fig.show()
proportion_rank = get_scaling(data_swift_2_full, "cpu-bind", "rank", "none")
print(
"On average, calculations on full nodes with --cpu-bind=rank run in",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
proportion_cores = get_scaling(data_swift_2_full, "cpu-bind", "cores", "none")
print(
"\nOn average, calculations on full nodes with --cpu-bind=cores run in",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
On average, calculations on full nodes with --cpu-bind=rank run in 0.885298044547332 the amount of time as calculations without --cpu-bind. On average, calculations on full nodes with --cpu-bind=cores run in 0.9323764171503093 the amount of time as calculations without --cpu-bind.
fig = px.scatter(
data_swift_2_half,
x="Cores",
y="scaled_rate",
color="cpu-bind",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 2 on Swift (Half Nodes)",
)
fig.show()
proportion_rank = get_scaling(data_swift_2_half, "cpu-bind", "rank", "none")
print(
"On average, calculations on half-filled nodes with --cpu-bind=rank run in",
proportion_rank,
"the amount of time as calculations without --cpu-bind.",
)
proportion_cores = get_scaling(data_swift_2_half, "cpu-bind", "cores", "none")
print(
"\nOn average, calculations on half-filled nodes with --cpu-bind=cores run in",
proportion_cores,
"the amount of time as calculations without --cpu-bind.",
)
On average, calculations on half-filled nodes with --cpu-bind=rank run in 1.3483218840678666 the amount of time as calculations without --cpu-bind. On average, calculations on half-filled nodes with --cpu-bind=cores run in 1.1269873562506034 the amount of time as calculations without --cpu-bind.
Because it is smaller than Benchmark 2, Benchmark 1 was used to explore kpoints scaling as well as changes in performance due to the KPAR and NPAR tags.
data_swift_1 = data_swift[data_swift["Benchmark Code"] == "1"]
data_swift_1_4x4x2 = data_swift_1[data_swift_1["kpoints"] == "4x4x2"]
data_swift_1_10x10x5 = data_swift_1[data_swift_1["kpoints"] == "10x10x5"]
To get a good idea of scaling between calculations with 4x4x2 kpoints grids and 10x10x5 kpoints grids, look at the graphs in the following section. Each graph is followed by an average kpoints scaling for the given KPAR/NPAR configuration, which gives the average amount of time needed to run a 4x4x2 calculation expressed as a percentage of the time needed to run a 10x10x5 calculation on the same number of cores. To view a table of results, see Recommendations for Running VASP on Swift.
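The `get_scaling_KN` calls below compare one KPAR/NPAR configuration against the default configuration for a single kpoints grid. A sketch of what such a helper could look like (the real one is defined earlier in the notebook; the `Runtime_s` column name is an assumption):

```python
import pandas as pd


def get_scaling_KN(df_config, df_default, kpoints):
    # Restrict both frames to one kpoints grid, average runtime per core
    # count, then average the per-core-count ratios with equal weight.
    a = df_config[df_config["kpoints"] == kpoints].groupby("Cores")["Runtime_s"].mean()
    b = df_default[df_default["kpoints"] == kpoints].groupby("Cores")["Runtime_s"].mean()
    return (a / b).dropna().mean()


# Toy example: the configuration runs in 0.25x the default time at 64 cores
# and 0.5x at 128 cores, so the equal-weight average is 0.375.
toy_config = pd.DataFrame(
    {"Cores": [64, 128], "kpoints": ["4x4x2", "4x4x2"], "Runtime_s": [10.0, 20.0]}
)
toy_default = pd.DataFrame(
    {"Cores": [64, 128], "kpoints": ["4x4x2", "4x4x2"], "Runtime_s": [40.0, 40.0]}
)
print(get_scaling_KN(toy_config, toy_default, "4x4x2"))  # 0.375
```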
data_swift_1_K1 = data_swift_1[data_swift_1["KPAR"] == 1]
data_swift_1_K4 = data_swift_1[data_swift_1["KPAR"] == 4]
data_swift_1_K9 = data_swift_1[data_swift_1["KPAR"] == 9]
data_swift_1_K1_N4 = data_swift_1_K1[data_swift_1_K1["NPAR"] == "4"]
data_swift_1_K1_ND = data_swift_1_K1[data_swift_1_K1["NPAR"] == "D"]
data_swift_1_K1_Nsqrt = data_swift_1_K1[data_swift_1_K1["NPAR"] == "sqrt"]
data_swift_1_K4_ND = data_swift_1_K4[data_swift_1_K4["NPAR"] == "D"]
data_swift_1_K9_N4 = data_swift_1_K9[data_swift_1_K9["NPAR"] == "4"]
data_swift_1_K4_Nsqrt = data_swift_1_K4[data_swift_1_K4["NPAR"] == "sqrt"]
data_swift_1_K4_N4 = data_swift_1_K4[data_swift_1_K4["NPAR"] == "4"]
data_swift_1_K9_Nsqrt = data_swift_1_K9[data_swift_1_K9["NPAR"] == "sqrt"]
fig = px.scatter(
data_swift_1_K1_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=1, NPAR=4",
)
fig.show()
proportion = get_scaling(data_swift_1_K1_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.1975141522550828 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K1_N4, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_swift_1_K1_N4, data_swift_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.5090346292174364 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=4 ran in 0.6028076724156124 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K1_ND,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Nodes",
"Partition",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=1, NPAR=# of cores",
)
fig.show()
proportion = get_scaling(data_swift_1_K1_ND, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.187687779611522 the amount of time as the 10x10x5 kpoints grid calculations
fig = px.scatter(
data_swift_1_K1_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=1, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_swift_1_K1_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.18557856687555285 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K1_Nsqrt, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_swift_1_K1_Nsqrt, data_swift_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.8992932111437968 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=1, NPAR=sqrt(# of cores) ran in 0.8936429988792207 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K4_ND,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=4, NPAR=# of cores",
)
fig.show()
proportion = get_scaling(data_swift_1_K4_ND, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.20460889560586515 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K4_ND, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_swift_1_K4_ND, data_swift_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in 0.3155517478601654 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=# of cores ran in 0.35680104665469237 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K9_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=9, NPAR=4",
)
fig.show()
proportion = get_scaling(data_swift_1_K9_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.3409966806834724 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K9_N4, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_swift_1_K9_N4, data_swift_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=4 ran in 0.37857221589965856 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=4 ran in 0.24285269584083571 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K4_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=4, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_swift_1_K4_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.19944158617005048 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K4_Nsqrt, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_swift_1_K4_Nsqrt, data_swift_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in 0.3724752313459252 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=sqrt(# of cores) ran in 0.4301798105777811 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K4_N4,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=4, NPAR=4",
)
fig.show()
proportion = get_scaling(data_swift_1_K4_N4, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.20699084255595146 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K4_N4, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(data_swift_1_K4_N4, data_swift_1_K1_ND, "10x10x5")
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=4, NPAR=4 ran in 0.29764560579057525 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=4, NPAR=4 ran in 0.33767965209899 the amount of time needed for calculations with the default KPAR/NPAR settings.
fig = px.scatter(
data_swift_1_K9_Nsqrt,
x="Cores",
y="scaled_rate",
color="kpoints",
symbol="HPC System",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift: KPAR=9, NPAR=sqrt",
)
fig.show()
proportion = get_scaling(data_swift_1_K9_Nsqrt, "kpoints", "4x4x2", "10x10x5")
print(
"On average, 4x4x2 kpoints grid calculations run in ",
proportion,
"the amount of time as the 10x10x5 kpoints grid calculations",
)
On average, 4x4x2 kpoints grid calculations run in 0.3192887638021647 the amount of time as the 10x10x5 kpoints grid calculations
proportion_4x4x2 = get_scaling_KN(data_swift_1_K9_Nsqrt, data_swift_1_K1_ND, "4x4x2")
proportion_10x10x5 = get_scaling_KN(
data_swift_1_K9_Nsqrt, data_swift_1_K1_ND, "10x10x5"
)
print(
"On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in",
proportion_4x4x2,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
print(
"\nOn average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in",
proportion_10x10x5,
"the amount of time needed for calculations with the default KPAR/NPAR settings.",
)
On average, using a 4x4x2 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in 0.3496038206680978 the amount of time needed for calculations with the default KPAR/NPAR settings. On average, using a 10x10x5 kpoints grid, calculations with KPAR=9, NPAR=sqrt(# of cores) ran in 0.5080902355201756 the amount of time needed for calculations with the default KPAR/NPAR settings.
Now let's take the best-performing combination of KPAR and NPAR to continue the analysis: KPAR=9, NPAR=4.
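For reference, these parallelization settings go in the INCAR file. KPAR and NPAR are standard VASP tags; the fragment below is a sketch of the combination used from here on (comment text is illustrative):

```
KPAR = 9   ! number of k-point groups treated in parallel
NPAR = 4   ! number of bands treated in parallel within each k-point group
```

Note that VASP requires the total core count to be divisible into these groups, so valid KPAR/NPAR combinations depend on the number of cores requested.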
fig = px.scatter(
data_swift_1_K9_N4,
x="Cores",
y="scaled_rate",
color="MPI",
symbol="kpoints",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift",
)
fig.show()
data_swift_1_K9_N4_4x4x2 = data_swift_1_K9_N4[data_swift_1_K9_N4["kpoints"] == "4x4x2"]
data_swift_1_K9_N4_10x10x5 = data_swift_1_K9_N4[
data_swift_1_K9_N4["kpoints"] == "10x10x5"
]
proportion_4x4x2 = get_scaling(data_swift_1_K9_N4_4x4x2, "MPI", "intel_impi", "openmpi")
proportion_10x10x5 = get_scaling(
data_swift_1_K9_N4_10x10x5, "MPI", "intel_impi", "openmpi"
)
print(
    "On average, using a 4x4x2 kpoints grid, calculations using Intel MPI ran in",
    proportion_4x4x2,
    "times the time needed by calculations using OpenMPI.",
)
print(
    "\nOn average, using a 10x10x5 kpoints grid, calculations using Intel MPI ran in",
    proportion_10x10x5,
    "times the time needed by calculations using OpenMPI.",
)
On average, using a 4x4x2 kpoints grid, calculations using Intel MPI ran in 0.8449282396277468 times the time needed by calculations using OpenMPI.

On average, using a 10x10x5 kpoints grid, calculations using Intel MPI ran in 0.8581872496184438 times the time needed by calculations using OpenMPI.
fig = px.scatter(
data_swift_1_K9_N4,
x="Cores",
y="scaled_rate",
color="cpu-bind",
symbol="kpoints",
hover_data=[
"HPC System",
"Partition",
"Nodes",
"cpu-bind",
"MPI",
"Benchmark Code",
],
labels={"scaled_rate": "Rate(1/s)/Electronic Steps"},
title="Benchmark 1 on Swift",
)
fig.show()
proportion_4x4x2_rank = get_scaling(
data_swift_1_K9_N4_4x4x2, "cpu-bind", "rank", "none"
)
proportion_4x4x2_cores = get_scaling(
data_swift_1_K9_N4_4x4x2, "cpu-bind", "cores", "none"
)
proportion_10x10x5_rank = get_scaling(
data_swift_1_K9_N4_10x10x5, "cpu-bind", "rank", "none"
)
proportion_10x10x5_cores = get_scaling(
data_swift_1_K9_N4_10x10x5, "cpu-bind", "cores", "none"
)
print(
    "On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank ran in",
    proportion_4x4x2_rank,
    "times the time needed by calculations without --cpu-bind.",
)
print(
    "\nOn average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores ran in",
    proportion_4x4x2_cores,
    "times the time needed by calculations without --cpu-bind.",
)
print(
    "\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank ran in",
    proportion_10x10x5_rank,
    "times the time needed by calculations without --cpu-bind.",
)
print(
    "\nOn average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores ran in",
    proportion_10x10x5_cores,
    "times the time needed by calculations without --cpu-bind.",
)
On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank ran in 1.0521222978842009 times the time needed by calculations without --cpu-bind.

On average, using a 4x4x2 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores ran in 1.0129159842027768 times the time needed by calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=rank ran in 1.092461620962752 times the time needed by calculations without --cpu-bind.

On average, using a 10x10x5 kpoints grid, calculations on half-filled nodes with --cpu-bind=cores ran in 0.9975326120099117 times the time needed by calculations without --cpu-bind.
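The --cpu-bind settings compared above are Slurm srun options. A sketch of a half-filled-node submission is shown below; the node size, time limit, and binary name are assumptions, and any required module loads are omitted:

```shell
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=32          # half-fill an assumed 64-core node
#SBATCH --time=1:00:00

# Bind each MPI rank to a physical core; compare with
# --cpu-bind=rank or omitting --cpu-bind entirely.
srun --ntasks=32 --cpu-bind=cores vasp_std
```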
os.system("jupyter nbconvert --execute --to html VASP_Recommendations.ipynb")
30720